With the rise of Web 1.0 and Web 2.0, technology has brought the world into the palm of our hands. We the consumers have a plethora of options in almost every domain of need – e-commerce, health care, finance, and many more. Hence, it has become critical for providers to receive feedback from customers in order to improve their experience and earn their loyalty. Every day we express our likes, dislikes, and opinions on social media platforms, websites, and feedback channels. In traditional quality improvement efforts, companies rely on root cause analysis and data analytics tools. More recently, topic modeling, sentiment analysis, and opinion mining have been used to extract hidden information from customer feedback data. However, these methods fail to capture customers' perceptions at a granular level. Joint topic-sentiment models can capture what these traditional methods miss.
Figure 1: Sample Output from typical Topic Models and Sentiment Analysis
Figure 2: Expected Output from Joint Topic-Sentiment Model
Topic modeling is an unsupervised natural language processing (NLP) technique that learns latent semantic topical representations from a text corpus. Topic models use co-occurrences of words across different texts to capture the relationships between words. They learn a probabilistic representation of topics in terms of words, and a probabilistic representation of text data in terms of topics. For example, from a corpus of news articles, a topic model can identify key topics like “sports”, “politics”, “entertainment”, “international”, etc. Each topic is characterized by a set of words with corresponding weights. The topic “sports” will place higher weight on sports-related words like “baseball” and “soccer” than on other words. Similarly, “politics” will place higher weight on “policy”, “government”, “election”, etc.
Figure 3a: Sample Topic Model Output for Text 1
Figure 3b: Sample Topic Model Output for Text 2
In the above two images, we can see how a topic model can assign different topics into two different texts based on the words they contain.
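The co-occurrence signal that topic models exploit can be illustrated with a minimal sketch. The mini-corpus below is hypothetical (two sports-like snippets and two politics-like snippets); the point is that words belonging to the same topic repeatedly appear together in the same documents, while words from different topics rarely do.

```python
from collections import Counter
from itertools import combinations

# A hypothetical mini-corpus: two sports snippets, two politics snippets.
docs = [
    "baseball soccer game score",
    "soccer match score goal",
    "election policy government vote",
    "government policy debate vote",
]

# Count how often each unordered word pair appears in the same document.
# Topic models exploit exactly this signal: words that co-occur across
# documents end up with high weight in the same topic.
cooc = Counter()
for doc in docs:
    words = sorted(set(doc.split()))
    for pair in combinations(words, 2):
        cooc[pair] += 1

print(cooc[("score", "soccer")])       # 2 – co-occur in both sports snippets
print(cooc[("government", "policy")])  # 2 – co-occur in both politics snippets
print(cooc[("policy", "soccer")])      # 0 – never co-occur across topics
```

A real topic model generalizes this idea, turning raw co-occurrence statistics into smooth probability distributions over words and topics.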
Sentiment analysis, also known as polarity identification or emotion classification, is the task of understanding the opinions expressed in a text. In sentiment analysis we typically classify a text into “positive”, “neutral”, or “negative” classes. Sentiment analysis can be performed in both a supervised and an unsupervised manner. In supervised learning, manually labeled text data are used to train a model that predicts sentiment classes for unseen data. In unsupervised learning, however, we rely on external knowledge – sentiment word lexicons and linguistic rules – to calculate sentiment.
For example, given the text “he is an excellent writer”, unsupervised methods can classify it as “positive”.
Figure 4: Sample Word Sentiment Lexicon (1 means positive, 0 means negative)
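A minimal sketch of the unsupervised, lexicon-based approach follows. The tiny lexicon here is made up for illustration (mirroring the 1 = positive, 0 = negative convention of Figure 4), and the averaging rule is one simple choice among many.

```python
# A tiny hypothetical sentiment lexicon (1 = positive, 0 = negative),
# following the convention of the word lexicon in Figure 4.
lexicon = {"excellent": 1, "good": 1, "great": 1,
           "bad": 0, "terrible": 0, "poor": 0}

def classify(text):
    """Lexicon-based sentiment: average the polarity of the known words."""
    scores = [lexicon[w] for w in text.lower().split() if w in lexicon]
    if not scores:
        return "neutral"  # no lexicon word found
    avg = sum(scores) / len(scores)
    return "positive" if avg > 0.5 else "negative" if avg < 0.5 else "neutral"

print(classify("he is an excellent writer"))  # positive
print(classify("the display is terrible"))    # negative
```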
A joint topic-sentiment model, or JST, is a class of unsupervised techniques that combines the best of both worlds. Like traditional topic models (LDA and its variants), JST models take a probabilistic graphical generative approach. In particular, JST assumes –
Every document is a mixture of sentiments and, for each sentiment, a mixture of topics
Every sentiment-topic pair is represented as a distribution over words
In a typical generative model, we sample the latent variables (the attributes we are trying to learn – topics, topic-sentiment pairs, etc.) and, based on the probabilities of those samples, we calculate the likelihood of generating a text by multiplying the likelihoods of all the words it contains. During the learning process, we maximize this likelihood to obtain the optimal values of the latent parameters. Techniques such as the EM algorithm, variational inference, and Gibbs sampling are used for training.
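The likelihood computation described above can be sketched concretely. The distributions below are invented for illustration (not learned from data): for each word we sum over topics, then multiply across words – done in log space for numerical stability.

```python
import math

# Hypothetical learned distributions for one document.
# p(topic | doc): the document leans toward "sports".
p_topic = {"sports": 0.8, "politics": 0.2}

# p(word | topic): each topic is a distribution over the vocabulary
# (only a few words shown; the weights are illustrative, not learned).
p_word = {
    "sports":   {"soccer": 0.5, "score": 0.4, "election": 0.1},
    "politics": {"soccer": 0.1, "score": 0.1, "election": 0.8},
}

def log_likelihood(words):
    """Sum over topics for each word, multiply across words (in log space)."""
    total = 0.0
    for w in words:
        p_w = sum(p_topic[t] * p_word[t][w] for t in p_topic)
        total += math.log(p_w)
    return total

# Under these distributions, a sports-heavy text is more likely than
# an election-heavy one; training adjusts the distributions to maximize
# the likelihood of the observed corpus.
print(log_likelihood(["soccer", "score"]) > log_likelihood(["election", "election"]))
```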
In JST, the latent (learnable) parameters are: the topic distribution for each sentiment class in each document, the sentiment distribution for each document, and the word distribution for each topic-sentiment pair. In contrast, LDA (Latent Dirichlet Allocation) learns only a topic distribution for each document and a word distribution for each topic.
Figure 5a: LDA Model
Figure 5b: JST Model
In LDA (Figure 5a) θ is the topic distribution for each document; φ is the word distribution for each topic; T is the number of topics; D is the number of documents; α and β are hyperparameters.
In JST (Figure 5b) θ is the topic distribution for each sentiment class and document; φ is the word distribution for each topic-sentiment pair; π is the sentiment distribution for each document; α, β, γ, and λ are hyperparameters.
The expected output from the JST model is –
Figure 6: JST Output
From the above output, we can extract the key areas (“camera” and “price”) where the customer has a positive opinion, as well as the key pain point (“display”, negative with 60% probability). Additionally, the model identified the overall “positive” sentiment of the review with 65% probability. This information can be leveraged to understand a user’s preferences and the relative importance they place on different features of a product or service, allowing companies to cater to personalized requests and engage their customers better.
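Turning such output into strengths and pain points is a simple post-processing step. The dictionary below is a hypothetical JST output mirroring Figure 6 (the camera and price probabilities are assumed; the display and overall figures follow the text above).

```python
# Hypothetical JST output for one review, mirroring Figure 6:
# per-topic sentiment probabilities plus an overall document sentiment.
jst_output = {
    "doc_sentiment": {"positive": 0.65, "negative": 0.35},
    "topics": {
        "camera":  {"positive": 0.80, "negative": 0.20},
        "price":   {"positive": 0.70, "negative": 0.30},
        "display": {"positive": 0.40, "negative": 0.60},
    },
}

# Split aspects into strengths and pain points by their dominant sentiment.
strengths = [t for t, s in jst_output["topics"].items() if s["positive"] > 0.5]
pain_points = [t for t, s in jst_output["topics"].items() if s["negative"] > 0.5]

print(strengths)    # ['camera', 'price']
print(pain_points)  # ['display']
```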
References
Blei, D. M., Ng, A. Y., and Jordan, M. I. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003. http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf
A Comparative Study of Bayesian Models for Unsupervised Sentiment Detection